Add a client for getting data from YouTube #1100

Closed
wants to merge 1 commit into from

Conversation

jon-betts
Contributor

@jon-betts jon-betts commented Jul 19, 2023

Requires:

Review notes

This PR uses an undocumented API to get the JSON information. It seems to return the same data as the HTML scraping method, but avoids:

  • Getting the JSON from the HTML
  • Getting the HTML in the first place
  • Dealing with cookie consent issues in the HTML
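For context, here is a rough sketch of the "getting the JSON from the HTML" step that the scraping method needs and this PR avoids. It is not code from this PR: the function name `extract_embedded_json`, the marker `examplePlayerResponse = ` and the sample HTML are all made up for illustration, and a real page would be much messier.

```python
import json


def extract_embedded_json(html: str, marker: str) -> dict:
    """Return the JSON object that follows `marker` in `html`.

    Hypothetical helper for illustration only, not part of this PR.
    """
    start = html.find(marker)
    if start == -1:
        raise ValueError(f"marker {marker!r} not found in HTML")

    # Find the opening brace of the object after the marker.
    brace = html.index("{", start + len(marker))

    # raw_decode parses exactly one JSON value starting at `brace` and
    # ignores whatever follows it (the trailing ";" and "</script>"),
    # so we don't have to count braces or use a fragile regex.
    obj, _end = json.JSONDecoder().raw_decode(html, brace)
    return obj


# Made-up sample page; the variable name is an assumption.
SAMPLE_HTML = (
    "<html><script>var examplePlayerResponse = "
    '{"videoDetails": {"title": "Demo; with {braces}"}};'
    "</script></html>"
)

data = extract_embedded_json(SAMPLE_HTML, "examplePlayerResponse = ")
print(data["videoDetails"]["title"])
```

Even in this toy form, the scraping route depends on the page layout and on getting usable HTML back in the first place, which is where the cookie consent problem comes in.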

This is all being done a bit by the skin of our teeth. I'd ideally like to try more error scenarios, but I guess we'll catch them later.
The one I've spotted so far is that asking for a translated

Testing notes

You can play about as you'd like, but here is something to get you started.

from dataclasses import asdict
import json

from via.services.youtube_api import YouTubeAPIClient

client = YouTubeAPIClient()

# Get the video info
video = client.get_video_info("1v8u3ua6BPk")
print("Video details:")
print("\t", video.details.title)
print("\t", video.details.author)
print("\t", video.url)

# Try the matching system
caption_track = video.caption.tracks[0]
print("\nSelecting caption track:")
print("\t", caption_track)

# Get the transcript
transcript = client.get_transcript(caption_track)
print("\nTranscript:")
print(json.dumps(asdict(transcript), indent=4))
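
The snippet above just takes the first caption track. If you'd rather pick one by language while testing, something like the sketch below could work. It is a standalone illustration, not the matching system from this PR: the `Track` class and its `language_code` and `auto_generated` fields are assumptions, so adapt them to whatever the real caption track model exposes.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Track:
    """Hypothetical stand-in for a caption track (fields are assumptions)."""

    language_code: str
    auto_generated: bool = False


def pick_track(
    tracks: Sequence[Track], preferred_language: str = "en"
) -> Optional[Track]:
    """Prefer the requested language, then human-written over auto-generated."""
    if not tracks:
        # Some videos have no captions at all; avoid an IndexError.
        return None

    def rank(track: Track) -> tuple:
        # False sorts before True, so matching language and
        # human-written tracks get the lowest (best) rank.
        return (track.language_code != preferred_language, track.auto_generated)

    # min() returns the first track with the lowest rank, so ties keep
    # the original ordering of the track list.
    return min(tracks, key=rank)


tracks = [Track("de"), Track("en", auto_generated=True), Track("en")]
print(pick_track(tracks))
print(pick_track([]))
```

Returning `None` for an empty list keeps the "no captions" case explicit instead of failing on `tracks[0]`.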

@jon-betts jon-betts added wip technical enabler Work which only serves to enable other work labels Jul 19, 2023
@jon-betts jon-betts self-assigned this Jul 19, 2023
@jon-betts jon-betts removed the wip label Jul 19, 2023
@jon-betts jon-betts force-pushed the yt-cc-client branch 3 times, most recently from d5ab4ab to 175df95 Compare July 20, 2023 17:11
@jon-betts jon-betts changed the base branch from yt-cc-model to yt-cc-model-2 July 20, 2023 17:13
@seanh seanh self-assigned this Jul 21, 2023
@jon-betts jon-betts requested a review from seanh July 24, 2023 15:15
@jon-betts jon-betts assigned seanh and unassigned seanh Jul 24, 2023
@seanh seanh added the blocked label Jul 28, 2023
@seanh
Contributor

seanh commented Jul 28, 2023

Labeling this as blocked just because it's waiting for me to get some more urgent PRs done before I can review this. I promise I'm working my way round to reviewing this ASAP!

@jon-betts jon-betts force-pushed the yt-cc-model-2 branch 4 times, most recently from 7aa0421 to 7830357 Compare August 2, 2023 18:57
@seanh
Contributor

seanh commented Aug 15, 2023

Closing in favour of #1162

Labels
technical enabler Work which only serves to enable other work